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University  of  Hawaii 

1.  Introduction. 

The  problem  considered  is  that  of  discriminating  between  stationary 
Gaussian  random  processes.  That  is,  assume  that  a sample  function 
y = (y (l)^ • • • ^y (n) ) is  hypothesized  to  be  a sample  function  from  one  of 
two  alternative  processes,  with  each  process  characterized  as  a zero 
mean  covariance  matrix  function  stationary  Gaussian  process.  The  objective 
is  to  determine  the  structure  of  the  minimum  classification  error  decision 
procedure  and  to  compute  the  probability  of  misclassification.  This  work 
was  primarily  stimulated  by  a paper  by  Grenander  [l]  in  vdiich  it  was 
demonstrated  that  for  scalar  processes  the  probability  of  classification 
error  decreased  geometrically  with  n the  number  of  observations.  That 
result  was  achieved  by  Laplace's  method  for  the  evaluation  of  integrals 
in  terms  of  the  limiting  distribution  of  the  eigenvalues  of  a block  Toeplitz 
matrix.  The  results  in  this  paper  are  achieved  by  elementary  moment 
analysis  methods. 


* 

Supported  in  part  by  NSF  Grant  ENG  74-09883  at  the  University  of  Hawaii. 
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2.  Analysis. 


1.  Structure  of  the  decision  procedure:  Assiime  that  the 

observation  matrix  y = (y (l)j . . . ,y (n))  of  the  d-component  vectors 
y (j i = Ij • • • j n is  a sample  function  from  one  of  two  alternative 
finitely  generated  zero  mean  stationary  completely  non-deterministic 
random  processes.  Introduce  the  notation,  m = 1,2,  to  denote 

the  alternative  hypotheses  and  assume  that  the  a priori  probability 
of  and  are  equal*  Then,  the  minimum  probability  of  error 

decision  procedure  to  distinguish  the  alternative  hypotheses  is  to 
compute  2 x log -likelihood  ratio  (LR)  statistic  L^(y), 

L„(y)  = (1) 

f (y)  f^(y(l)---y(n)) 

and  to  decide  if  L^(y)  > 0 and  otherwise.  In  Equation  (l) 

the  superscripted  quantities  f”^(y)  denote  the  probability  density 
function  of  the  observation  vector  y under  the  ra-th  hypothesis. 

In  general  the  n vector  of  dependent  normal  observations 

V . 

y = (y (l), . , . ,y (n) ) can  be  orthogonalized  to  the  form 

f(y(l),,..,y(n))  = f (e(l), . * . ,e(n) ) == 

nd  1 

= (2«)'  IT  II  r ® exp-f-  i(e-  (t)  I -h(t)}]. 

t=l  t t ' 

In  Equation  (2)  and  the  following  unless  otherwise  identified,  lower 
case  letters  denote  vectors  and  upper  case  letters  denote  matrices. 
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an  apostrophe  indicates  matrix  transposition  and  |a|  denotes  the  deter- 
minant of  matrix  A,  A Gram-Schmidt  procedure  is  one  satisfactory  way  to 
accomplish  the  orthogonal! zatlon.  Inverse  filtering  and  least  squares 
prediction  methods  are  alternative  interpretations  of  the  orthogonalizatlons 
[2].  At  this  point  it  is  helpful  to  interpret  Equation  (2)  in  the  light 
of  some  more  particular  structure  and  notation. 

Using  the  Wold  representation  theorem,  the  assumption  that  y(t) 
is  completely  non-deterministic  is  equivalent  to  the  statement  that 
it  can  be  represented  as 

CO 

y(t)  = £ h e(t-i)  , E[e(t)]  = 0 , E[€(t)e»(s)]  = V6,  . 

i=0 

(3) 

That  is,  the  {y(t)}  process  can  be  thought  of  as  the  output  of  a 
linear  filter  with  matrix  impulse  response  {h^}  whose  input  is  the 
stationary  zero  mean  uncorrelated  sequence  [e(t))o  Corresponding  to 
the  y(t)  process,  a zero  mean  "residual"  process  {e(t)}  can  be 
identified  as  the  process  that  is  produced  by  a filter  inverse  to 
{h  } acting  on  the  y(t)  process.  Equivalently  [e(t))  can  be 
interpreted  as  the  process  representing  the  difference  between  y(t) 
and  its  least  squares  estimate  y(t).  Let  the  sequence  e(l), , , . ,e(n) 
in  Equation  (2)  be  a finite  sample  function  of  the  e(t)  process.  In 
general  as  identified  in  Equation  (2)  the  sequence  e (l), , . . , e (n)  need 
not  have  stationary  covariances. 

Identify  the  Wold,  moving  average  (MA)  or  innovations  representa- 
tion of  Equation  (3)  for  the  different  classes  of  processes  considered 
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in  the  operator  notation 


y“(t)  = (h“^)  e”^(t)  , E[e“^(t)e”^(s)‘]  =.  m = 1,2  , (4) 

Then  from  the  inverse  filter  point  of  view,  in  operator  notation  the 
inverse  of  the  k=th  process,  k = 1,2,  may  be  written 

e^(t)  = (h^)  y(t)  = (A^)  y(t)  k = 1,2  . (5) 

(The  assmptlon  that  y(t)  is  finitely  generated  implies  that  the 
filter  can  be  thought  of  as  computed  from  a realization  of  the  y(t) 
unknown  parameters  of  the  y(t)  process,  ) Now  consider  the  least 
squares  estimator  interpretation  of  the  orthogonalization  of  the  y 
sequence.  It  is  known  that  y(t)  the  least  squares  estimator  of 
y(t)  is  the  conditional  mean  of  y(t)  given  the  past  of  y(t)  and 
that  for  Gaussian  processes  the  estimator  is  a linear  function  of  the 
data.  Thus 

00 

y(t)  = E[y(t)  |y(t==l),, , . } = ~ Z A.  y(t-i)  (6) 

i=l 

where  {A^}  are  dxd  coefficient  matrices.  Thus  identify  the  residual 
process  e(t)  by 

00 

e(t)  = y(t)-y(t)  = Z A.  y(t~i)  , A = I , (7) 

i=0  ^ ° 

The  infinite  autoregressive  (AR)  representation  of  the  residual  process 
in  Equations  (7)  and  (5)  as  the  inverse  filter  is  purely  formal  and 
extremely  convenient.  It  does  not  exclude  alternative  Markovian  or 
Kalman  filter  type  interpretations,  litien  the  y(t)  data  sequence 
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is  finite^  the  AR  coefficient  matrices  become  functions  of  time, 


As  Grenander,  our  concern  is  with  an  asymptotic  theory.  Thus  as 

n increases  and  the  effects  of  initial  conditions  diminish,  the 

residual  process  tends  toward  a stationary  process.  That  is,  the 

time  indexed  coefficient  matrices  A^^  and  tend  to  the  constant 

matrices  A^  and  V respectively.  Applying  the  forgoing  considera- 

1 2 

tions  to  the  distribution  of  f (y),  f (y)  in  the  likelihood  ratio 
statistic  obtain; 

Lemma  1.  The  asymptotic  log-likelihood  ratio  statistic  for 
distinguishing  between  alternative  stationary  Gaussian  time  series 
is 


L (y)  = 2in  ~ 

^ r(y) 


= n log 


V. 


n 


-1  1, 


+ ^ e"^(t)'v"V(t)  - ^ e^(t)>V”-"e-^(t) 

t=l  t=l  ^ 


(8) 


In  Equation  (8)  e’^(t)  is  the  residual  at  time  t obtained  by 

filtering  the  observed  y(t)  process  with  the  m-th  model  Inverse 
filter,  m = 1,2.  The  optimal  decision  procedure  thus  involves  the 
application  of  an  inverse  of  filter  to  the  observed  data  sequence 
y (l), . ,y(n).  Calculation  of  the  LR  statistic  in  Equation  (8)  and 
application  of  the  decision  rule,  decide  K if  L (y)  > 0 and 
decide  otherwise.  The  optimum  decision  procedure  structure  is 

illustrated  in  Figure  1. 


5 


2. 


Statistics  of  the  Log -Likelihood  ratio  test; 

Lemma  2,  The  log-likelihood  ratio  test  satistic  is  asymptotically 

normally  distributed  with  conditional  means  and  variances  q , cr^ 

^m  m 

m = 1^2; 

Ln(y)|H^~n(p^,4)  , m = l,2.  (9) 

(Explicit  formulas  for  cr^  are  in  Lemmas  3 ajid  4). 

Proof.  The  LR  statistic^  Equation  (8)  is  the  sum  of  sums  of 
dependent  quadratic  terms.  Under  conditions  on  the  rate  of  decay  of 
dependence,  that  are  difficult  to  ascertain^  each  of  the  sum  of  terms 
tends  toward  a normal  distribution.  On  the  other  hand^  each  of  the  sum 
of  terms  is  in  the  form 


2 e(t)»v"^e(t)  = I tr(e(t)e(t)’V~^)  = tr(  7 e (t )e (t ) ' )v"^ 
t=l  t=l  tTi 

= n tr[Cee (O )V~^] 


(10) 


In  Equation  (10)  tr[A]  is  the  trace  of  matrix  A,  tr[^]  = tr[BA], 
Cee(O)  is  the  sample  covariance  matrix  of  a residual  process  calculated 
at  zero  lag  and  V is  a theoretical  (constant)  residual  variance  matrix. 
Based  upon  the  earlier  work  of  Diananda  [3]  and  Walker  [4]  and  following 
Hannan  [5]  and  Anderson  [6]  the  elements  of  Cee(O)  are  asymptotically 
normally  distributed.  Thus  Cee(O)  and  Gee  have  multivariate 

normal  distributions  and  asymptotically  tr  (Gee  (o)?”'^)  has  a limiting 
scalar  distribution.  Following  Hannan,  a sufficient  condition  for 
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normality  of  the  sample  covariance  is  that  if  {h^}  is  the  impulse 
response  matrix  of  the  e(t)  process  that  ^ * Explicit 

formula  for  {h,  } are  given  following  Lemma  3 and  this  condition  is 

Ij 

seen  to  be  trivially  satisfied.  Lemma  2 expresses  the  situation  of 
preimary  interest,  the  distribution  of  the  LR  statistic  under  the 
alternative  hypotheses.  The  conditional  means  and  variances  of  the 
limiting  distribution  of  the  log -likelihood  ratio  statistic  are  computed 
next. 


Lemma  Time  and  frequency  domain  formula  for  the  conditional 
mean  and  variance  of  the  LR  statistic  are: 


E[L^(y)|H,  ] =2/  dy  = ^ 2n  l(l;2) 

n 1 f2(y)  1 


= n 


= n 


n 


= n 


IVgl 

ri/2 

in  + tr 

|v^| 

-^-1/2 

' IvJ 

" 00 

2 ' 

in  — ^ — + tr 

I (h‘ 

L l\l 

Lj=0 

/ f^(y)in  ™ 

hi  ay 

(y) 

IVgl 

ri/2 

in  - tr 

Iql 

^-1/2 

i v„| 

r 00 

in  — — - tr 

I 0 

y=o 

y definition. 

1(1: 

S^(f)S2(f)“^df-d 


)V“^ 

1 .1  2 


-d 


(a) 


(b) 


S2(f)S^(f)"  df  + d 


+ d 


(c) 


(d) 


(11) 


measure  of  the  amount  of  information  per  observation  to  decide 
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instead  of  when  is  true^  power 

spectral  densities  of  the  {y^(t))  and  [y^(t)}  process  respec- 
tively and  h^^^  is  the  impulse  response  of  the  residual  e^'’”^(t) 

/,  k,m  .Bi  k, 

process,  (h  -Ah)  the  process  formed  by  acting  on  the  process 
k,  , 

y (t)  with  the  model  corresponding  to  the  inverse  of  the  m-th 
process.  Hie  time  and  frequency  domain  formulas  in  Equation  (ll) 


are  new. 

Consider  the  conditional  expectations  of  each  of  the  terms 
in  Equation  (8)  separately,  Olien, 


n 


-1  1 


n 


E[  X (t)v:^e"-(t)|H.  ] = tr[(  X e^(t)e^(t)’ IhJV.7^]  = nd  , (l2) 


t=l 


because 


1 


n 


t=l 


1^1 


E(  Y e^(t)e^(t)'  |h,  ) = n E[Ce^e^(0)]  = nV^  , 


1 1, 


t=l 


Then,  too. 


n 

z 

t=l 


E[  X (t)v"^e^(t)|H  ] = tr  E[(  X (t  )e^  (t )’ |h  Jv'^ ] . (13) 

t=l 


Eow  by  definition  of  the 


e^(t) 


process 


e^(t)|H^  = (A^)y^(t)  = (AV)e^(t)  = (h^^^)e^(t) 

(14) 

00 

= X e^(t-i)  . 

i=0 
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Direct  expansion  of  the  operator  expressions  in  Equation  (l4 ) yields 
the  result 


, k.m  r,  k 


I 

i=l 


A"hJ 

X t- 


.ij 


= 0,1, 


(15) 


Digression:  It  is  convenient  here  to  think  of  as  the  AE  process 

coefficient  matrices  in  the  representation  of  y”^(t).  Thus  we  observe 
that  hQ^“^  = Z,  h^"^  = IS^  and  Var  (e^"“^(t ) ) > Var'(e^"^(t ) ) with 
equality  if  and  only  if  m = 1.  These  conclusions  axe  a direct  conse- 
quence of  the  relationship  between  the  AR  and  MA  representations 
of  a stationary  process.  Thus  in  the  AR  and  MA  representations 


y^(t)  ^ (h^)e^(t)  , (A^)y^(t)  = £^(t)  . (l6) 


From  this  we  obtain  the  useful  recursive  relationships  for 


(a  ) (h  ) = I ^ h^  “ " ■^i^t-i  ^ ^ ~ 1^2^,.. 

i=l 

00 

hw.kN  ^ .k  y.  _ T 

(h  ) (a  ) — I j A^  — = ^ h a } h^  A^  I . 


(IT) 


Returning  to  the  proof  of  the  lemma^ 
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= n tr[  (h^V^h^ ' ){F  V“\“ ) ] (^8 ) 

rl/2 

= n tr[  S (f)S  (f)  ^df] 

J-l/2  ^ 2 

In  Equation  (l8)  we  made  repeated  use  of  the  matrix  interchange  property 
of  the  trace  operation^  and  the  fact  that  for  stationary  process  the 
covariance  function  and  spectral  densities  are  Fourier  transform  pairs. 

We  also  used  the  formiilas  for  the  spectral  density  of  a process 
S(f)  = hVh  ' = A ^VA  where  h and  A are  the  usual  polynomial  matrix 
frequency  domain  operators,  [2]  and  the  dagger  j denotes  complex 
conjugate  transpose.  The  combination  of  the  methods  and  results  in 
Equations  (l3)-(l8)  yield  the  frequency  domain  formulas  for  the  condi- 
tional mean  LR  statistic  in  Equation  (lla  and  11c )»  Shumway  and  Unger 
[8]  proved  a result  similar  to  Equation  (l6)  for  the  scalar  case  via  a 
limiting  distribution  of  the  eigenvalues  of  a Toeplitz  matrix  developments 
The  frequency  domain  formulas  for  the  Kullback-Liebler  numbers. 
Equations  (lla  and  11c)  may  be  computed  by  a variety  of  techniques. 

The  corresponding  time  domain  formulas.  Equations  (lib  and  lid)  are 
very  satisfactorily  computed  by  an  approximating  sum.  The  time  domain 
formulas  may  be  obtained  directly  starting  with  Equation  (l4).  Using  the 
matrix  interchange  property  of  the  trace,  the  formula  in  Equation  (l3 ) 
for  e^(t)|H^,  and  the  fact  that  E[e^(t)e^(t)'  jH^]  = V^, 
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(19) 


E[  7 e^’ (t)v:V(t)|H  ] = I tr  E(e^^^(t)V  (t))v“^ 

t-1  t=l 


, r ^ A ,1,2'  , -In 

n tr[  ^ (h  / V li  ^ )V  j 
J=0  ^ ^ ^ 


A side  point  is  that  a useful  interpretation  of  the  Kullback-Liebler 
measure,  taken  directly  from  the  definition  in  Equation  (ll)  is. 


f^(y(l)j  * • « (n)  |h^)  = exp  -nl(l:2)f^(y  (l), . c . ,y  (n)  |h^)  (20) 

That  is,  the  value  of  the  likelihood  (incremental  probability)  of  the 
observed  sequence  y (l), . . • ,y (n)  under  the  assumption  of  model  2 when 
process  1 is  true,  is  on  the  average  exp  -nl(l:2)  times  the  value  of 
the  likelihood  of  that  sequence  under  the  assumption  of  model  1,  the 
time  model  of  the  observed  process, 

LEMMA  The  conditional  variances  of  the  LR  statistics  are 


(21) 

p p 

ff  = 2n[d  + ^ (1- 

r=0 

where 


£){tr2(r2^^)+tr^(r^J,l)~dtr(h2^^)tr(h;^^  v'^Vg)}] 


0-  = 2n[d  + 7 (1 

r=0 


j,k,m 

r 


CO 


r 

c 2. 

j-o 


h^->h 
D k 


k,m' 

j+r 


A>m  = (rf’“)' 

-r  k 


(22) 
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Proof:  This  lemma  is  a computational  result  that  requires  computation 

of  the  conditional  variance  of  the  s-ommed  terms  in  the  LR  statistic 
in  Equation  (8)  and  the  covariance  of  those  terms.  These  are  only- 
second  moment  computations,  they  follo-w  the  spirit  and  the  technique 
of  the  computations  for  the  previous  lemma.  Therefore  the  obvious 
algebraic  details  are  omitted.  Consider  computation  of  the  conditional 
variance  of  the  first  summed  term  in  Equation  (8).  Using  the  formula 
Var(X)  = E[X^]-E^[X]  we  write 

E[  2 e^'(t)v'V(t)|K]  = E[  f i E Z e (t)S  (t)e  (s)e  (s)]. 
t=l  ^ ^ r=l  J=1  t=l  s=l  ^ ^ ^ 

(23) 

To  obtain  Equation  (23 ) we  employ  the  upper-lower  triangular  factorization 
of  Vg  = L(V2)U(V2)^  = U ^(V2)lZ'('V2)  and  identify  the  components 

e^(t)  via  the  multiplication  e^' (t)u”^  (Vg)  = [e^(t), . ,e^(t)] . Then, 
recall  the  known  result  for  normally  distributed  random  variables 


E[X^X2X^X^]  = E[X^X2]E[X^X^] 


+ E[X^X^]E[XgX^; 


+ E[X^X^]E[X2X  ] 


(24) 


Apply  this  result  in  Equation  (23),  recombine  terms  and  employ  direct 
evaluation  to  obtain 


= 2 £ E (tr  E[e^(s)e^(t)^v"^])^ 
t=l  s=l 

(25) 

= 2n  i (l-|){tr2(rJ;^2)Hr2(r^^2^} 

r=l  ^ 
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where 


are  as  defined  in  Equation  (23).  Similarly^ 


r ^ -r 


Var  ( X (t ) » V"^e^  (t ) | H ) = 2 X X (tr  E[ e^  (s  )e^  (t ) ’ Y~ 
t=l  + t=l  s=l 


2nd^  . 


The  covariance  terms  may  be  evaluated  as 


(26) 


^ 2>,,  ^„-12,,  , ^ 1, 


Cov(  X (t)v“'^e^(t)  £ e‘^(s)*V“^e^(s)|H  ! 

t=l  ^ s=l  ^ 


n n 

z z 

t=l  s=l 


= 2 X XsE[e^(s)e^(t)»h^^^'v"^]E[h^^^e^(t)e^(s)’V"^] 


(2?: 


n-1 


= 2nd  X (1- j)tr(h^^^)tr(h^^^’v"^V^) 


r=0 


THEOREM.  The  asymptotic  probability  of  classification  error  between 
stationary  Gavissian  processes  is  bounded  exponentially  with  n,  the 
number  of  observations. 

Proof r-  Ry  virtue  of  the  fact  that  the  LR  statistic  is  asymptotically 
normal^  consider  direct  evaluation  of  classification  error  under 
Assume  jj.^  > ^2*  Then 


P (error  I ' 
n^  ' 1' 


hj^(y)|Hi  dy  = 'K(-  ^)  . 


(28) 


In  Equation  (28)^  by  elementary  considerations  > 0,  Since  and 

2 ^2- 
cr  are  linearly  proportional  to  n,  CL  ^ = -/n  c,  > 0»  Thus 


4 


(29) 


= i-4(a)  < — = — 


exp 


The  bound  is  obtained  by  integration  by  parts  (i.e,,  Mills’  ratio). 

Similar  result?  are  obtained  for  P (error |h„)  and  the  alternative 

n c. 

assumption  that  < Hg, 


Discussion. 

This  paper  presents  a general  result  on  the  asymptotic  theory  of 
the  discrimination  between  fliultivariate  stationary  Gaussian  random 
processes.  The  result  that  the  asymptotic  probability  of  error  bounded 
exponentially  with  n is  stronger  than  that  achieved  by  Grenander. 

Also  important,  this  analysis  readily  suggests  implementations  to 
achieve  minimum  classification  error  between  stationary  random  processes. 
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